Article history:
Accepted Jan 11, 2022

Keywords:
Discrete wavelet transform
Obstructive sleep apnea
Random forest
Support vector machine

Healthy sleep is an important factor in maintaining mental as well as physical well-being. More
than six-hour recordings are required to [...] readings. As a result, automated computer-based
assessment is needed to detect abnormalities as early as possible. Obstructive sleep apnea (OSA)
can also be classified automatically from ECG signals. ECG signals of 18 subjects from the
Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) polysomnographic database are
considered. Each signal is segmented into 30-second epochs, and features are extracted using the
discrete wavelet transform (DWT). A seven-level DWT decomposition with the wavelet 'sym3' is
applied to the segmented electrocardiogram signal. 12 features are extracted from each level, and
all of them are used to categorize the five types of sleep apnea. Random forest, k-nearest
neighbor (KNN), and support vector machine (SVM) classifiers are used for the classification of
apnea. The random forest (RF) classifier outperformed the others with an average accuracy (Acc)
of 98.86%. The experimental results show that the developed model outperforms the
state-of-the-art algorithms in the literature.

1. INTRODUCTION

Sleep plays a significant role in human health. The time and cost associated with the harmful
consequences of poor sleep quality are growing. Poor sleep quality is typically linked to physical
and mental health complications [1]. Sleep apnea can have serious and life-shortening
consequences. Obstructive sleep apnea (OSA) is a condition in which the upper airway closes
temporarily during sleep, preventing air from flowing into the lungs [2]. When breathing does not
come to a complete halt but the amount of air entering the lungs with each breath is reduced, the
respiratory episode is recognised as hypopnea [3]. Researchers have proposed different approaches
for detecting sleep apnea from the electrocardiogram (ECG) signal. These approaches start with
data collection from the Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH)
apnea database, followed by data pre-processing, extraction of features from the ECG signal and
classification of the features.

Variational mode decomposition is used to separate the ECG into multiple modes, which are then
used to extract various features that are fed to a k-nearest neighbor classifier [4]. Different
machine learning techniques have been used for the detection of sleep apnea (SA), such as ANN
[5]-[7], SVM [8]-[11] and an orthogonal wavelet filter bank [12]. Nishad et al. [13] use the ECG
signal to identify apnea and non-apnea events. The ECG signal is decomposed into sub-bands using
the wavelet transform, features are extracted from each sub-band, and the extracted features are
given to different classifiers [13], [14]. A hidden Markov model that considers temporal
dependence within segmented ECG signals is used for OSA detection [15]. The normal inverse
Gaussian parameters of each sub-band of the ECG segments are computed, and adaptive boosting is
used for the detection of SA [16]. The beat-by-beat power spectral density of HRV and R peak area
were evaluated using a bivariate autoregressive model, and a k-nearest neighbor (KNN) classifier
was used to separate apnea occurrences from normal ones on a minute-by-minute basis for each
recording [17]. ECG signals are commonly used to diagnose heart problems and sleep-related
disorders. Various sources of noise within the signal's frequency band typically contaminate the
recorded ECG signal, altering its properties and making it difficult to extract usable
information from it [18]-[22]. The characteristics of the ECG signal are critical in identifying
heart disorders [23]-[25] and sleep disorders. The literature shows that most authors considered
only two classes, i.e., sleep apnea event and non-sleep apnea event. In our work, we classify five
types of SA.

The objective of this work is to provide a computer-based solution for identifying sleep apnea
using machine learning algorithms. The research aims to achieve more accurate results than manual
diagnosis. Its scope is to identify five classes: central apnea with arousal (CAA), obstructive
apnea (OA), obstructive apnea with arousal (X), hypopnea with arousal (HA), and normal (N), using
the freely available MIT-BIH Polysomnographic database. The features are extracted from a
statistical analysis of the discrete wavelet coefficients of the collected ECG signals. The
extracted features are given to support vector machine (SVM), KNN and random forest classifiers
for the classification of SA.


2. METHOD 
2.1. Dataset 

The data used in this work are taken from the MIT-BIH Polysomnographic database, which is
available from PhysioNet and was recorded at Boston's Beth Israel Hospital. The database includes
recordings of 18 subjects, which together contain over 80 hours of ECG recordings with
annotations of sleep stage and apnea labels. All signals were sampled at 250 Hz. The ECG signal is
available as 30-second segments, and each segment is labelled. The data set provides a total of
16 classes of sleep stage and apnea labels, of which five are considered for classification in
this study, as shown in Figure 1. We extracted only the 30-second segments labelled with these
five classes from each record to form the data set. Figure 2 shows a normal ECG signal and an ECG
signal with OSA. The ECG signals considered are treated as noise free. If a signal contained
noise, a noise removal algorithm would have to be applied first.
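The epoch length follows directly from the sampling rate: 250 Hz x 30 s = 7500 samples per
segment. The sketch below illustrates this arithmetic on a synthetic list of samples. The
function name `split_into_epochs` and the stand-in recording are hypothetical, and the snippet
does not read the actual database files or their annotations.

```python
def split_into_epochs(signal, fs=250, epoch_seconds=30):
    """Split a 1-D signal into consecutive, non-overlapping epochs.

    Trailing samples that do not fill a whole epoch are dropped.
    """
    samples_per_epoch = fs * epoch_seconds  # 250 Hz * 30 s = 7500 samples
    n_epochs = len(signal) // samples_per_epoch
    return [
        signal[i * samples_per_epoch:(i + 1) * samples_per_epoch]
        for i in range(n_epochs)
    ]


# Hypothetical stand-in for a recording: 95 seconds of samples.
recording = list(range(250 * 95))
epochs = split_into_epochs(recording)
print(len(epochs), len(epochs[0]))  # 3 7500
```

In practice the epoch boundaries would come from the database annotations rather than from a
fixed stride, so this is only a model of the segment size.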


[Figure 1: block diagram (MIT-BIH Polysomnographic database, feature extraction, classification of sleep apnea)]

Figure 2. ECG signal

2.2. Feature extraction

The next step is to extract features from the ECG signals. Features provide the characteristic
information needed to describe the signal, and the extracted features are later used as inputs to
the classifiers. 18 records were chosen from the MIT-BIH Polysomnographic database, with each
signal divided into 30-second segments. The wavelet transform works well with non-stationary
signals such as the ECG. It starts with a larger window to capture the major features and then
moves to smaller windows to capture the finer features. At low frequencies, the wavelet transform
has a high resolution in the frequency domain but a low resolution in the time domain. At high
frequencies, it has a high resolution in the time domain but a low resolution in the frequency
domain. This is the main reason wavelets were selected for feature extraction. The DWT of each
30-second epoch is used to extract the features. The wavelet 'sym3' was used to perform a
seven-level decomposition. The wavelet transform divides a signal into a set of approximation
(Aj) and detail (Dj) coefficients at levels j = 1, 2, ..., 7. Mean, correlation coefficient,
standard deviation, skewness, variance, kurtosis, Shannon entropy, RMS, median, minimum wavelet
coefficient, maximum wavelet coefficient and harmonic mean were all considered in this work for
the classification of apnea. For classification purposes, 12 features were collected from each
level. The features are:
- Mean: It is defined as the average of the N observations. The mean formula is given in (1).

$$\mu = \frac{1}{N}\sum_{x=0}^{N-1} A_x \tag{1}$$
- Variance: It is determined by averaging the squared deviations from the mean. Variance is given in (2):

$$V = \frac{1}{N-1}\sum_{x=0}^{N-1} \lvert A_x - \mu \rvert^{2} \tag{2}$$

where $\mu$ is the mean, $\mu = \frac{1}{N}\sum_{x=0}^{N-1} A_x$.
- Standard deviation: It is the square root of the variance, and it is given in (3).

$$\sigma = \sqrt{V} \tag{3}$$


- Shannon entropy: Shannon entropy, also called information entropy or the Shannon entropy index,
is a measure of the degree of randomness in a set of data, and it is given in (4).

$$H(X) = -\sum_{i} P(x_i)\,\log P(x_i) = \sum_{i} P(x_i)\,\log\frac{1}{P(x_i)} \tag{4}$$


- Skewness: The skewness of a real-valued random variable's probability distribution around its
mean is a measure of its asymmetry. It is given in (5):

$$s = \frac{E\left[(t-\mu)^{3}\right]}{\sigma^{3}} \tag{5}$$

where $\mu$ is the mean of t and $\sigma$ is the standard deviation of t.
- Kurtosis: It is a measure of how outlier-prone a distribution is, and it is given in (6):

$$k = \frac{E\left[(t-\mu)^{4}\right]}{\sigma^{4}} \tag{6}$$

where $\mu$ is the mean of t and $\sigma$ is the standard deviation of t.
- RMS: RMS is the root-mean-square value of a signal, and it is given in (7):

$$x_{RMS} = \sqrt{\frac{1}{N}\sum_{n=1}^{N} \lvert x_n \rvert^{2}} \tag{7}$$


- Median: It separates the higher half from the lower half of a data sample, and it is given in (8):

$$\mathrm{Med}(X) = \begin{cases} X\left[\frac{n+1}{2}\right], & n \text{ odd} \\ \dfrac{X\left[\frac{n}{2}\right] + X\left[\frac{n}{2}+1\right]}{2}, & n \text{ even} \end{cases} \tag{8}$$

where X is the ordered set of values in the data set and n is the number of values.
- Harmonic mean: The harmonic mean is frequently used to compute the average of ratios or rates.
It is the most appropriate measure for ratios and rates since it equalises the weight of every
data point, and it is given in (9):

$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \tag{9}$$

where n is the number of values in x.
- Correlation coefficient: The correlation coefficient of two random variables is a measure of
their linear dependence, and it is given in (10):

$$\rho(A,B) = \frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{A_i-\mu_A}{\sigma_A}\right)\left(\frac{B_i-\mu_B}{\sigma_B}\right) \tag{10}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation.
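The feature pipeline described above can be sketched in plain Python. This is a minimal
illustration under several assumptions, not the authors' implementation: the Haar wavelet is used
as a simple stand-in for 'sym3' (whose longer filters are not reproduced here), the epoch is a
synthetic sine mixture, and only nine of the twelve statistics are computed. Shannon entropy,
harmonic mean and correlation coefficient are left out because the text does not state how
probabilities, non-positive coefficients or the second variable are chosen. The function names
`haar_dwt_step`, `wavedec` and `band_features` are hypothetical.

```python
import math
import statistics


def haar_dwt_step(x):
    """One DWT level using the Haar wavelet (a stand-in for 'sym3')."""
    if len(x) % 2:  # odd length: repeat the last sample so pairs line up
        x = x + [x[-1]]
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def wavedec(x, levels=7):
    """Return the detail bands D1..D7 and the final approximation A7."""
    bands, approx = {}, list(x)
    for j in range(1, levels + 1):
        approx, detail = haar_dwt_step(approx)
        bands[f"D{j}"] = detail
    bands[f"A{levels}"] = approx
    return bands


def band_features(c):
    """Nine of the twelve statistics for one vector of coefficients."""
    n = len(c)
    mu = sum(c) / n
    var = sum((v - mu) ** 2 for v in c) / (n - 1)  # equation (2)
    # Assumption: population central moments stand for E[.] in (5) and (6).
    m2 = sum((v - mu) ** 2 for v in c) / n
    m3 = sum((v - mu) ** 3 for v in c) / n
    m4 = sum((v - mu) ** 4 for v in c) / n
    return {
        "mean": mu,
        "variance": var,
        "std": math.sqrt(var),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else float("nan"),
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else float("nan"),
        "rms": math.sqrt(sum(v * v for v in c) / n),
        "median": statistics.median(c),
        "min": min(c),
        "max": max(c),
    }


# Synthetic 30-second epoch at 250 Hz (not ECG data).
fs = 250
epoch = [
    math.sin(2 * math.pi * 1.2 * t / fs) + 0.1 * math.sin(2 * math.pi * 25 * t / fs)
    for t in range(fs * 30)
]
bands = wavedec(epoch, levels=7)
feature_vector = [
    value for coeffs in bands.values() for value in band_features(coeffs).values()
]
print(list(bands))          # ['D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7', 'A7']
print(len(feature_vector))  # 72
```

A seven-level decomposition yields eight coefficient vectors (D1 to D7 plus A7). With all twelve
statistics per vector this would give 12 x 8 = 96 values per epoch, which matches the feature
count reported in the results, although the text does not spell out this breakdown.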


2.3. Classification 
In this work, the machine learning algorithms k-nearest neighbour, support vector machine and
random forest are implemented to identify the five kinds of sleep apnea. Support vector machines
are a type of supervised learning algorithm that can be used for both classification and
regression. The objective is to find a separating hyperplane in n-dimensional space.
a. Support vectors: The data points that are closest to the hyperplane are called support
vectors. The separating line is defined with the help of these data points.
b. Hyperplane: A decision plane or space that separates objects of different classes.
c. Margin: The gap between the two lines through the nearest data points of different classes. It
can be calculated as the perpendicular distance from the line to the support vectors.
The key objective of SVM is to divide the data set into classes by finding a maximum marginal
hyperplane (MMH), which is done in two steps: i) first, SVM generates hyperplanes iteratively
that separate the classes as well as possible; and ii) then, it chooses the hyperplane that
separates the classes with the largest margin.
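For reference, the maximum-margin idea is commonly written as the optimisation problem below for
the linearly separable two-class case. This is the standard textbook form, not a formula taken
from this paper. The symbols w (normal vector), b (offset), x_i (feature vector), y_i (label in
{-1, +1}) and m (number of training samples) are notation introduced here.

```latex
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, m
```

The margin between the two supporting lines is 2/||w||, so minimising ||w|| maximises the margin.
A five-class task is usually handled by combining several such binary classifiers, for example
one-vs-one or one-vs-rest. The text does not state which scheme or kernel was used.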


2.3.1. K-Nearest Neighbors (KNN) 

KNN classifies a sample according to its k closest neighbours. KNN is a non-parametric learning
algorithm and makes no assumptions about the input data set or its distribution. Whenever a data
sample is to be classified, its distances to the training samples are computed, and the class of
the sample is predicted from the nearest points. The Euclidean distance measure is given in (11).

$$d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^{2}} \tag{11}$$


2.3.2. Algorithm of KNN 

The whole process of KNN is as follows: i) choose the number k of nearest neighbors; ii)
calculate the Euclidean distance between the given sample and the training samples; iii) take the
k nearest neighbors according to the calculated distances; iv) count the number of data points in
each category among these k neighbors; and v) assign the given sample to the category with the
majority of votes.
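The steps above can be sketched in a few lines of Python. This is a minimal illustration, not
the implementation used in the study: the function names `euclidean` and `knn_predict`, the value
k = 3 and the two-dimensional training points are all hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance of equation (11)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Sort the training samples by distance to the query and keep the k closest.
    nearest = sorted(zip(train_x, train_y), key=lambda p: euclidean(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors labelled with two of the class names.
train_x = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]
train_y = ["N", "N", "N", "OA", "OA", "OA"]

print(knn_predict(train_x, train_y, (0.1, 0.1)))  # N
print(knn_predict(train_x, train_y, (2.1, 2.0)))  # OA
```

Because the distance is sensitive to scale, features with large numeric ranges dominate the vote
unless the feature vectors are normalised first.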


2.3.3. Random forest (RF) 

Using a bagging method, random forest builds classification trees from randomly selected
attributes of randomly selected samples. In general, a larger number of trees in the forest gives
more stable results. Overfitting is a problem that degrades the results of many algorithms, but
the RF classifier is less prone to it because it combines the votes of many trees. Missing values
can also be handled with RF. Multiple trees are grown, and during testing the votes from each
tree are collected and the predicted class is decided by the majority of votes.


2.3.4. Algorithm of RF 

The whole process of RF is as follows: i) K features are selected randomly from the training data
set; ii) a decision tree is built from the selected data points; iii) the number M used to build
the decision tree is selected; iv) steps i to ii are repeated; and v) steps i-iv are repeated n
times to generate n trees, which completes the forest.
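The two sources of randomness (bootstrap samples and random feature subsets) and the final
majority vote can be sketched as follows. To keep the example short, each tree is a one-split
decision stump rather than a full decision tree, so this is a simplified stand-in for a random
forest and not the classifier used in the study. All names, the toy data and the parameter
values are hypothetical.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest misclassified samples."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: draw a bootstrap sample (with replacement) of the training set.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Each tree only sees a random subset of the features.
        feature_ids = rng.sample(range(len(rows[0])), n_features)
        forest.append(train_stump(sample_rows, sample_labels, feature_ids))
    return forest


def predict(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = Counter(
        left if row[f] <= threshold else right
        for f, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with two of the class names.
rows = [(0.1, 1.0), (0.2, 1.2), (0.3, 0.9), (0.25, 1.1),
        (1.1, 3.0), (1.3, 3.2), (1.2, 2.9), (1.0, 3.1)]
labels = ["N"] * 4 + ["OA"] * 4
forest = train_forest(rows, labels)
print(predict(forest, (0.05, 0.8)), predict(forest, (1.5, 3.5)))
```

A full implementation would grow deeper trees and usually draw a new feature subset at every
split, but the bagging and voting structure is the same.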


Automated sleep apnea classification based on statistical and spectral ... (Lavu Venkata Rajani Kumari) 


1454 O ISSN: 2502-4752 


2.4. Performance measures 

The following five performance parameters are used in this work: accuracy, sensitivity,
specificity, precision, and F-score. They are computed from the confusion matrix.
- Accuracy: It is the number of correctly classified values divided by the total number of values.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP} \tag{12}$$


- Precision: It is the ratio of true positives to the sum of true positives and false positives.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{13}$$


- Sensitivity: Recall or sensitivity is the ratio of true positives to the sum of true positives
and false negatives.

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{14}$$

- Specificity: The ratio of the number of true negative predictions to the total number of negatives.

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{15}$$

- F-Score: The harmonic mean of precision and recall is the F-score.

$$F\text{-}Score = 2 \times \frac{\text{precision} \times \text{sensitivity}}{\text{precision} + \text{sensitivity}} \tag{16}$$


True positive (TP): total number of correctly detected epochs. False positive (FP): total number
of wrongly detected epochs. False negative (FN): total number of missed epochs. True negative
(TN): total number of correctly rejected epochs.
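For a multi-class problem, TP, FP, FN and TN are usually obtained per class in a one-vs-rest
fashion from the confusion matrix. The sketch below assumes that convention, which the text does
not state explicitly. The function name `per_class_metrics` and the 3-class matrix are
hypothetical and are not data from this study.

```python
def per_class_metrics(cm, labels):
    """One-vs-rest metrics from a confusion matrix.

    cm[i][j] is the number of samples of actual class i predicted as class j.
    """
    total = sum(sum(row) for row in cm)
    results = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                 # actual k, predicted as another class
        fp = sum(row[k] for row in cm) - tp  # predicted k, actually another class
        tn = total - tp - fn - fp
        precision = tp / (tp + fp)
        sensitivity = tp / (tp + fn)
        results[label] = {
            "accuracy": (tp + tn) / total,
            "precision": precision,
            "sensitivity": sensitivity,
            "specificity": tn / (tn + fp),
            "f_score": 2 * precision * sensitivity / (precision + sensitivity),
        }
    return results


# Hypothetical 3-class confusion matrix (not data from the paper).
labels = ["N", "HA", "OA"]
cm = [
    [50, 0, 0],
    [2, 16, 2],
    [0, 4, 26],
]
metrics = per_class_metrics(cm, labels)
for label in labels:
    print(label, {name: round(value, 4) for name, value in metrics[label].items()})
```

The sketch does not guard against zero denominators, which occur when a class is never predicted
or never present.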


3. RESULTS AND DISCUSSION 

Table 1 shows that the 18 signals yield a total of 9229 epochs. A total of 96 features are
extracted from each epoch, giving a 9229 x 96 feature matrix. There were 178 epochs of central
apnea with arousal (CAA), 600 epochs of obstructive apnea (OA), 625 epochs of obstructive apnea
with arousal (X), 740 epochs of hypopnea with arousal (HA), and 7086 epochs of normal (N). The
complete feature set is separated into a training data set (80%) and a testing data set (20%) and
given to the different classifiers. Table 2 shows the correct classification and
misclassification details.


Table 1. Total training and testing data set
S. No   OSA     Data set   Training data   Testing data
1       CAA     178        143             35
2       OA      600        480             120
3       X       625        500             125
4       HA      740        592             148
5       N       7086       5668            1418
        TOTAL   9229       7383            1846


Table 2. Confusion matrix of three classifiers
               Confusion matrix of SVM        Confusion matrix of KNN        Confusion matrix of RF
Sleep apneas   N     HA    X     OA    CAA    N     HA    X     OA    CAA    N     HA    X     OA    CAA
CAA            0     1     0     0     34     0     1     0     0     34     0     0     0     0     35
OA             0     0     0     120   0      0     0     0     120   2      0     0     0     120   0
X              0     54    53    0     18     0     4     70    47    1      0     7     105   0     13
HA             0     146   0     0     2      0     146   0     0     2      0     147   0     0     1
N              1418  0     0     0     0      1418  0     0     0     0      1418  0     0     0     0

The performance measures are computed from the confusion matrix of each classifier and are listed
in Table 3 for each sleep apnea class. The average accuracy of the SVM classifier is 95.93%, of
the KNN classifier 96.85% and of the random forest classifier 98.86%. Figure 3 shows the
comparison of the performance parameters of all classifiers.
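As a quick arithmetic check, the reported RF accuracy appears consistent with the overall
accuracy, that is, the sum of the diagonal of the confusion matrix divided by the number of test
epochs. The sketch below uses the RF counts as read from Table 2, reordered so that rows and
columns share the class order N, HA, X, OA, CAA. It assumes that rows are the actual classes,
which agrees with the row sums matching the test counts in Table 1.

```python
classes = ["N", "HA", "X", "OA", "CAA"]

# RF confusion matrix as read from Table 2 (rows: actual, columns: predicted).
rf_confusion = [
    [1418, 0, 0, 0, 0],
    [0, 147, 0, 0, 1],
    [0, 7, 105, 0, 13],
    [0, 0, 0, 120, 0],
    [0, 0, 0, 0, 35],
]

correct = sum(rf_confusion[i][i] for i in range(len(classes)))
total = sum(sum(row) for row in rf_confusion)
accuracy = 100 * correct / total
print(f"{correct}/{total} = {accuracy:.2f}%")  # 1825/1846 = 98.86%
```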


Table 3. Performance parameters of classifiers
Sleep    SVM classifier                  KNN classifier                  RF classifier
apnea    Se%     Sp%     P%      Fs%     Se%     Sp%     P%      Fs%     Se%     Sp%     P%      Fs%
N        100     100     100     100     100     100     100     100     100     100     100     100
HA       98.64   96.72   72.63   83.66   98.64   99.7    96.69   98.17   99.32   99.58   95.45   97.47
X        42.5    100     100     59.64   57.37   100     100     100     84      100     100     100
OA       100     100     100     100     98.36   97.26   81.63   88.76   100     100     100     100
CAA      97.14   99.94   62.96   76.4    97.14   99.72   87.17   93.02   100     99.22   64.15   77.92
Avg      87.65   99.33   87.12   83.94   90.3    99.34   93.1    95.99   96.66   99.76   91.92   95.08

[Bar chart of Acc%, Se%, Sp%, P% and Fs% for the SVM, KNN and RF classifiers; x-axis: performance parameters]


Figure 3. Performance comparison of classifiers 


3.1. Comparison with the literature 

Over the last few years, various researchers have proposed new methods for the identification of
obstructive sleep apnea. Most of the data sets considered are from PhysioNet or from different
hospitals. These methods are compared with the proposed method in Table 4. The random forest
classifier with the chosen features performs better than the existing algorithms.


Table 4. Comparison with the existing literature
                                                                                                                                  Performance results
Authors               Data input                                     Features                              Classifiers                  Acc %   Se %    Sp %
Ahsan et al. [27]     Institute of Breathing and Sleep,              Events of hypopnea and DWT            Two-staged feed-forward NN   95      92      99
                      Austin Hospital
Majdi et al. [23]     Apnea ECG-Physionet                            Time and spectral domain              SVM                          96      -       -
Lin et al. [24]       MIT-BIH database Physionet                     Wavelet transform                     ANN                          -       70      45
Carolina et al. [25]  Apnea ECG-Physionet and KU Leuven sleep lab    Wavelet and HRV                       Threshold                    85      85      85
Serein et al. [9]     MIT database                                   Wavelet packet decomposition of HRV   Linear SVM                   93.3    90      100
Sunil et al. [10]     Apnea ECG-Physionet                            Gabor filter responses                Least square SVM             93.3    -       -
Proposed work         MIT-BIH Polysomnographic                       Wavelet features                      Random Forest                98.86   96.66   99.76


4. CONCLUSION 


Automated sleep apnea classification models have been developed to classify five types of sleep
apnea. ECG signals are collected from the MIT-BIH Polysomnographic database, and the feature
extraction is based on the DWT. In total, 12 features are extracted from each level and given as
inputs to different classifiers. Among the classifiers compared, random forest performed best,
with 98.86% accuracy, 96.66% sensitivity, 99.76% specificity, 91.92% precision, and an overall
F-score of 95.08% in detecting and classifying sleep apnea from ECG signals.